Abstract — This paper considers the problem of detecting the 
support (sparsity pattern) of a sparse vector from random noisy 
measurements. Conditional power of a component of the sparse 
vector is defined as the energy conditioned on the component 
being nonzero. Analysis of a simplified version of orthogonal 
matching pursuit (OMP) called sequential OMP (SequOMP) 
demonstrates the importance of knowledge of the rankings of 
conditional powers. When the simple SequOMP algorithm is 
applied to components in nonincreasing order of conditional 
power, the detrimental effect of dynamic range on threshold- 
ing performance is eliminated. Furthermore, under the most 
favorable conditional powers, the performance of SequOMP 
approaches maximum likelihood performance at high signal-to- 
noise ratio. 

Index Terms — compressed sensing, convex optimization, lasso, 
maximum likelihood estimation, orthogonal matching pursuit, 
random matrices, sparse Bayesian learning, sparsity, thresholding 



I. Introduction 

Sets of signals that are sparse or approximately sparse with 
respect to some basis are ubiquitous because signal modeling 
often has the implicit goal of finding such bases. Using a 
sparsifying basis, a simple abstraction that applies in many 
settings is for 

y = Ax + d (1) 

to be observed, where $A \in \mathbb{R}^{m \times n}$ is known, $x \in \mathbb{R}^n$ is the
unknown sparse signal of interest, and $d \in \mathbb{R}^m$ is random
noise. When m < n, constraints or prior information about x
are essential to both estimation (finding vector $\hat{x}(y)$ such that
$\|x - \hat{x}\|$ is small) and detection (finding index set $\hat{I}(y)$ equal
to the support of x). The focus of this paper is on the use of 
magnitude rank information on x — in addition to sparsity — in 
the support detection problem. We show that certain scaling 
laws relating the problem dimensions and the noise level are 
changed dramatically by exploiting the rank information in a 
simple sequential detection algorithm. 

The simplicity of the observation model (1) belies the
variety of questions that can be posed and the difficulty of 
precise analysis. In general, the performance of any algorithm 
is a complicated function of A, x, and the distribution of d. 
To enable results that show the qualitative behavior in terms of 
problem dimensions and a few other parameters, we assume 
the entries of A are i.i.d. normal and describe x by its energy 
and its smallest-magnitude nonzero entry. 

We consider a partially-random signal model

    $x_j = b_j s_j, \qquad j = 1, 2, \ldots, n,$    (2)

where components of vector b are i.i.d. Bernoulli random
variables with $\Pr(b_j = 1) = 1 - \Pr(b_j = 0) = \lambda > 0$ and s is
a nonrandom parameter vector with all nonzero entries. The
value $|s_j|^2$ represents the conditional power of the component $x_j$
in the event that $b_j = 1$. We consider the problem where the
estimator knows neither $b_j$ nor $s_j$, but may know the order
or rank of the conditional powers. In this case, the estimator
can, for example, sort the components of s in an order such
that

    $|s_1| \ge |s_2| \ge \cdots \ge |s_n| > 0.$    (3)



The main contribution of this paper is to show that this 
rank information is extremely valuable. A stylized application 
in which the conditional ranks can be known is random 
access communication as described in [1]. Irrespective of this
application, we show that when conditional rank information is 
available, a very simple detector, termed sequential orthogonal 
matching pursuit (SequOMP), can be effective. The SequOMP 
algorithm is a one-pass version of the well-known orthogonal 
matching pursuit (OMP) algorithm (see references below). 
Similar to several works in sparsity pattern recovery 121- 
we analyze the performance of SequOMP by estimating 
a scaling on the minimum number of measurements m to 
asymptotically reliably detect the sparsity pattern (support) 
of x in the limit of large random matrices A. Although the
SequOMP algorithm is extremely simple, we show: 

• When the power orders are known and the signal-to-noise
ratio (SNR) is high, the SequOMP algorithm exhibits 
a scaling in the minimum number of measurements for 
sparsity pattern recovery that is within a constant factor 
of the more sophisticated lasso and OMP algorithms. 
In particular, SequOMP exhibits a resistance to large 
dynamic ranges, which is one of the main motivations 
for using lasso and OMP. 

• When the power profile can be optimized, SequOMP can
achieve measurement scaling for sparsity pattern recovery 
that is within a constant factor of optimal ML detection. 
This scaling is better than the best known sufficient 
conditions for lasso and OMP. 

The results are not meant to suggest that SequOMP is a good 
algorithm in any sense: other algorithms such as OMP can 
perform dramatically better. The point is to concretely and
provably demonstrate the value of conditional rank informa- 
tion. 

A. Related Work 

Under an i.i.d. Gaussian assumption on d, maximum likeli- 
hood estimation of x under a sparsity constraint is equivalent 
to finding sparse x such that $\|y - Ax\|_2$ is minimized. This
is called optimal sparse approximation of y using dictionary
A, and it is NP-hard [5]. Several greedy heuristics (matching
pursuit [6] and its variants with orthogonalization [7]-[9] and
iterative refinement [10], [11]) and convex relaxations (basis
pursuit [12], lasso [13], Dantzig selector [14], and others)
have been developed for sparse approximation, and under
certain conditions on A and y they give optimal or near-optimal
performance [15]-[17]. Results showing that near-optimal
estimation of x is obtained with convex relaxations,
pointwise over compressible x and with high probability
over some random ensemble for A, form the heart of the
compressed sensing literature [18]-[20]. Under a probabilistic
model for x and certain additional assumptions, exact asymptotic
performances of several estimators are known [21].

Our interest is in recovery or detection of the support (or 
sparsity pattern) of x rather than the estimation of x. In 
the noiseless case of d = 0, optimal estimation of x can
yield $\hat{x} = x$ under certain conditions on A; estimation and
detection then coincide, and some papers cited above and
notably [22] contain relevant results. In the general noisy case,
direct analysis of the detection problem has yielded much 
sharper results. 

A standard formulation is to treat s as a nonrandom 
parameter vector and b as either nonrandom with weight 
k or random with a uniform distribution over the weight-k
vectors. The minimum probability of detection error is then 
attained with maximum likelihood (ML) detection. Sufficient 
conditions for the success of ML detection are due to Wainwright [2];
necessary conditions based on channel capacity
were given by several authors [23]-[26], and conditions more
stringent in many regimes and a comparison of results appear
in [4]. Necessary and sufficient conditions for lasso
were determined by Wainwright [3]. Sufficient conditions for
orthogonal matching pursuit (OMP) were given by Tropp and
Gilbert [27] and improved by Fletcher and Rangan [28]. Even
simpler than OMP is a thresholding algorithm analyzed in a
noiseless setting in [29] and with noise in [4]. These results
are summarized in Table I using terminology defined formally
in Section II.

B. Paper Organization 

The remainder of the paper is organized as follows. The 
setting is formalized in Section II. In particular, we define all
the key problem parameters. Common algorithms and previous
results on their performances are then presented in Section III.
We will see that there is a potentially large performance gap
between the simplest thresholding algorithm and the optimal
ML detection, depending on the signal-to-noise ratio (SNR)
and the dynamic range of x. Section IV presents a new
detection algorithm, sequential orthogonal matching pursuit
(SequOMP), that exploits knowledge of conditional ranks.
Numerical experiments are reported in Section V. Conclusions
are given in Section VI, and proofs are relegated to the
Appendix. 

II. Problem Formulation 
In the observation model y = Ax + d, let $A \in \mathbb{R}^{m \times n}$ and
$d \in \mathbb{R}^m$ have i.i.d. $\mathcal{N}(0, 1/m)$ entries. This is a normalization
under which the ratio of conditional total signal energy to total
noise energy

    $\mathrm{SNR}(x) = \dfrac{E[\|Ax\|^2 \mid x]}{E[\|d\|^2]}$    (4)

simplifies to

    $\mathrm{SNR}(x) = \|x\|^2.$    (5)

Let

    $I_{\rm true} = \{\, j \in \{1, 2, \ldots, n\} : x_j \neq 0 \,\}$

denote the support of x. Using signal model (2),

    $I_{\rm true} = \{\, j \in \{1, 2, \ldots, n\} : b_j = 1 \,\}.$

The sparsity level of x is $k = |I_{\rm true}|$.

An estimator produces an estimate $\hat{I} = \hat{I}(y)$ of $I_{\rm true}$
based on the observed noisy vector y. Given an estimator,
its probability of error¹

    $p_{\rm err} = \Pr(\hat{I} \neq I_{\rm true})$    (6)

is taken with respect to randomness in A, noise vector d, and
signal x. Our interest is in relating the scaling of problem
parameters with the success of various algorithms. For this, 
we define the following criterion. 

Definition 1: Suppose that we are given deterministic sequences
$m = m(n)$, $\lambda = \lambda(n)$, and $s = s(n) \in \mathbb{R}^n$ that
vary with n. For a given detection algorithm $\hat{I} = \hat{I}(y)$, the
probability of error $p_{\rm err}$ is some function of n. We say that
the detection algorithm achieves asymptotic reliable detection
when $p_{\rm err}(n) \to 0$.

We will see that two key factors influence the ability to
detect $I_{\rm true}$. The first is the total SNR defined above. The
second is what we call the minimum-to-average ratio

    $\mathrm{MAR}(x) = \dfrac{\min_{j \in I_{\rm true}} |x_j|^2}{\|x\|^2 / k}.$    (7)

Since $I_{\rm true}$ has k elements, $\|x\|^2/k$ is the average of
$\{|x_j|^2 : j \in I_{\rm true}\}$. Therefore, $\mathrm{MAR}(x) \in (0, 1]$ with the upper limit
occurring when all the nonzero entries of x have the same
magnitude.

Finally, we define the minimum component SNR to be

    $\mathrm{SNR}_{\min}(x) = \min_{j \in I_{\rm true}} \dfrac{E[\|a_j x_j\|^2]}{E[\|d\|^2]} = \min_{j \in I_{\rm true}} |x_j|^2,$    (8)

where $a_j$ is the jth column of A and the second equality
follows from the normalization chosen for A and d.
The quantity $\mathrm{SNR}_{\min}(x)$ has a natural interpretation: The
numerator is the signal power due to the smallest nonzero
component in x, while the denominator is the total noise
power. The ratio $\mathrm{SNR}_{\min}(x)$ thus represents the contribution to
the SNR from the smallest nonzero component of x. Observe
that (5) and (7) show

    $\mathrm{SNR}_{\min}(x) = \min_{j \in I_{\rm true}} |x_j|^2 = \dfrac{1}{k}\, \mathrm{SNR}(x) \cdot \mathrm{MAR}(x).$    (9)



We will be interested in estimators that exploit minimal 
prior knowledge on x: either only knowledge of sparsity level 
(through k or A) or also knowledge of the conditional ranks 
(through the imposition of (3)). In particular, full knowledge
of s would change the problem considerably because the finite 
number of possibilities for x could be exploited. 
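
As a quick numerical illustration of the definitions in this section, the following Python sketch computes SNR(x), MAR(x), and SNR_min(x) for a small vector and checks relation (9). The function name `signal_stats` and the example vector are illustrative assumptions, not part of the original development; NumPy is assumed to be available.

```python
import numpy as np

def signal_stats(x):
    """Return (SNR, MAR, SNR_min, k) for a sparse vector x, following (5), (7), (8)."""
    x = np.asarray(x, dtype=float)
    support = np.flatnonzero(x)                 # index set I_true
    k = support.size                            # sparsity level
    snr = float(np.sum(x ** 2))                 # (5): SNR(x) = ||x||^2 under the N(0, 1/m) normalization
    snr_min = float(np.min(x[support] ** 2))    # (8): smallest nonzero squared magnitude
    mar = snr_min / (snr / k)                   # (7): minimum-to-average ratio
    return snr, mar, snr_min, k

# Hypothetical example vector with k = 3 nonzero entries.
x = np.array([0.0, 3.0, 0.0, -1.0, 2.0, 0.0])
snr, mar, snr_min, k = signal_stats(x)
# Relation (9): SNR_min = SNR * MAR / k
print(snr, mar, snr_min, snr * mar / k)
```

For this vector the dynamic range is moderate (MAR = 3/14); a vector with equal-magnitude nonzero entries would give MAR = 1.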

¹ An alternative to this definition of $p_{\rm err}$ could be to allow a nonzero fraction
of detection errors [25], [26].
III. Common Detection Methods 

In this section, we review several asymptotic analyses for 
detection of sparse signal support. These previous results hold 
pointwise over sequences of problems of increasing dimension 
n, i.e., treating x as an unknown deterministic quantity. That 
makes these results stronger than results that are limited to 
the model (2) where the $b_j$'s are i.i.d. Bernoulli variables. To
reflect the pointwise validity of these results, they are stated
in terms of deterministic sequences x, m, k, SNR, MAR, and
$\mathrm{SNR}_{\min}$ that depend on dimension n and are arbitrary aside
from satisfying $m \to \infty$ and the definitions of the previous
section. To simplify the notation, we drop the dependence of
x, m and k on n, and SNR, MAR and $\mathrm{SNR}_{\min}$ on x(n). When
the results are tabulated for comparison with each other and
with the results of Section IV, we replace k with $\lambda n$; this
specializes the results to the model (2).



A. Optimal Detection with No Noise 

To understand the limits of detection, it is useful to first 
consider the minimum number of measurements when there 
is no noise. Suppose that k is known to the detector. With no 
noise, the observed vector is y = Ax, which will belong to 
one of $J = \binom{n}{k}$ subspaces spanned by k columns of A. If
m > k, then these subspaces will be distinct with probability
1. Thus, an exhaustive search through the subspaces will reveal
which subspace y belongs to and thus determine the support
$I_{\rm true}$. This shows that with no noise and no computational
limits, the scaling in measurements of

    $m > k$    (10)

is sufficient for asymptotic reliable detection.

Conversely, if no prior information is known at the detector
other than x being k-sparse, then the condition (10) is also
necessary. If $m \le k$, then for almost all A, any k columns
of A span $\mathbb{R}^m$. Consequently, any observed vector y = Ax
is consistent with any support of weight k. Thus, the support
cannot be determined without further prior information on the
signal x.
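
The exhaustive search described above can be sketched in a few lines of Python for toy dimensions. This is only an illustration under assumed parameters (the sizes, the seed, the tolerance `tol`, and the function name `exhaustive_support_search` are all hypothetical); the search visits every one of the $\binom{n}{k}$ candidate supports and is therefore impractical beyond small n.

```python
import itertools
import numpy as np

def exhaustive_support_search(A, y, k, tol=1e-9):
    """Return every size-k support whose columns contain y in their span."""
    n = A.shape[1]
    matches = []
    for cand in itertools.combinations(range(n), k):
        cols = A[:, list(cand)]
        coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
        residual = np.linalg.norm(y - cols @ coef)
        if residual <= tol * max(1.0, np.linalg.norm(y)):
            matches.append(cand)
    return matches

rng = np.random.default_rng(0)
m, n, k = 3, 6, 2                                   # m > k: support is identifiable
A = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))
x = np.zeros(n)
x[[1, 4]] = [1.5, -0.7]                             # true support {1, 4} (0-based indices)
y = A @ x                                           # noise-free observation

matches = exhaustive_support_search(A, y, k)

# With m <= k, every size-k support is consistent with the observation.
A_small = A[:2, :]
matches_small = exhaustive_support_search(A_small, A_small @ x, k)
print(matches, len(matches_small))
```

With m > k only the true support survives, while with m = k all 15 candidate supports fit the data, which mirrors the necessity argument in the text.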

B. ML Detection with Noise 

Now suppose there is noise. Since x is an unknown de- 
terministic quantity, the probability of error in detecting the 
support is minimized by maximum likelihood (ML) detection. 
Since the noise d is Gaussian, the ML detector finds the k- 
dimensional subspace spanned by k columns of A containing 
the maximum energy of y. 

The ML estimator was first analyzed by Wainwright [2]. He
shows that there exists a constant C > 0 such that if

    $m > C \max\left\{ \dfrac{1}{\mathrm{MAR} \cdot \mathrm{SNR}}\, k \log(n-k),\; k \log(n/k) \right\}$
    $\phantom{m} = C \max\left\{ \dfrac{1}{\mathrm{SNR}_{\min}} \log(n-k),\; k \log(n/k) \right\}$    (11)

then ML will asymptotically detect the correct support. The
equivalence of the two expressions in (11) is due to (9). Also,
[4, Thm. 1] (generalized in [30, Thm. 1]) shows that, for any
$\delta > 0$, the condition

    $m > \dfrac{2(1-\delta)}{\mathrm{MAR} \cdot \mathrm{SNR}}\, k \log(n-k) + k$
    $\phantom{m} = \dfrac{2(1-\delta)}{\mathrm{SNR}_{\min}} \log(n-k) + k$    (12)

is necessary. Observe that when $\mathrm{SNR} \cdot \mathrm{MAR} \to \infty$, the lower
bound (12) approaches m > k, matching the noise-free case
(10) as expected.

These necessary and sufficient conditions for ML appear in
Table I with smaller terms and the infinitesimal $\delta$ omitted for
simplicity.

C. Thresholding 

The simplest method to detect the support is to use a 
thresholding rule of the form 



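    $\hat{I}_T = \{\, j \in \{1, 2, \ldots, n\} : \rho(j) > \mu \,\}$    (13)

where $\mu > 0$ is a threshold parameter and $\rho(j)$ is the
correlation coefficient:

    $\rho(j) = \dfrac{|a_j' y|^2}{\|a_j\|^2 \|y\|^2}, \qquad j = 1, 2, \ldots, n.$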



Thresholding has been analyzed in [4], [29], [31]. In particular,
[4, Thm. 2] is the following: Suppose

    $m > \dfrac{2(1+\delta)(1+\mathrm{SNR})\, k\, L(k,n)}{\mathrm{SNR} \cdot \mathrm{MAR}}$
    $\phantom{m} = \dfrac{2(1+\delta)(1+\mathrm{SNR})\, L(k,n)}{\mathrm{SNR}_{\min}}$    (14)

where $\delta > 0$ and

    $L(k,n) = \left[ \sqrt{\log(n-k)} + \sqrt{\log(k)} \right]^2.$    (15)

Then there exists a sequence of detection thresholds $\mu = \mu(n)$
such that $\hat{I}_T$ achieves asymptotic reliable detection of the
support. As before, the equivalence of the two expressions
in (14) is due to (9).
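
A minimal Python sketch of the thresholding rule (13) follows. It assumes the correlation coefficient is the normalized squared inner product between each column of A and y; the function name `threshold_detect`, the toy matrix, and the threshold values are illustrative assumptions rather than choices made in the cited analyses. Indices are 0-based.

```python
import numpy as np

def threshold_detect(A, y, mu):
    """Thresholding rule (13): keep indices whose correlation coefficient exceeds mu."""
    num = (A.T @ y) ** 2                              # |a_j' y|^2 for every column j
    den = np.sum(A ** 2, axis=0) * np.sum(y ** 2)     # ||a_j||^2 ||y||^2
    rho = num / den
    return set(np.flatnonzero(rho > mu).tolist()), rho

# Toy example with orthonormal columns: a strong and a weak component.
A = np.eye(4)
y = np.array([2.0, 0.0, 0.0, 1.0])

detected_high, rho = threshold_detect(A, y, mu=0.5)   # misses the weak component
detected_low, _ = threshold_detect(A, y, mu=0.1)      # finds both components
print(rho, detected_high, detected_low)
```

Even in this noise-free toy case, the weak component at index 3 has a correlation of only 0.2 because the energy of the strong component inflates the denominator, which hints at the sensitivity to dynamic range discussed next.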

Comparing the sufficient condition (14) for thresholding
with the necessary condition (12), we see two distinct problems
with thresholding:

• Constant offset: The scaling (14) for thresholding shows
a factor $L(k,n)$ instead of $\log(n-k)$ in (12). It is easily
verified that, for $k/n \in (0, 1/2)$,

    $\log(n-k) < L(k,n) < 4\log(n-k),$    (16)

so this difference in factors alone could require that
thresholding use up to 4 times more measurements than
ML for asymptotic reliable detection.
Combining the inequality (16) with (14), we see that the
more stringent, but simpler, condition

    $m > \dfrac{8(1+\delta)(1+\mathrm{SNR})}{\mathrm{SNR} \cdot \mathrm{MAR}}\, k \log(n-k)$    (17)

is also sufficient for asymptotic reliable detection with
thresholding. This simpler condition is shown in Table I,
where we have omitted the infinitesimal $\delta$ quantity to
simplify the table entry.
Condition | finite $\mathrm{SNR} \cdot \mathrm{MAR}$ | $\mathrm{SNR} \cdot \mathrm{MAR} \to \infty$
Necessary for ML | $m > \frac{2}{\mathrm{MAR} \cdot \mathrm{SNR}}\, k \log(n-k)$; Fletcher et al. [4, Thm. 1] | $m > k$ (elementary)
Sufficient for ML | $m > \frac{C}{\mathrm{MAR} \cdot \mathrm{SNR}}\, k \log(n-k)$; Wainwright [2] | $m > k$ (elementary)
Sufficient for SequOMP with best power profile | $m > \frac{8}{\log(1+\mathrm{SNR})}\, k \log(n-k)$; from Theorem 1 (Section IV-D) | $m > 9k$; from Theorem 1 (Section IV-E)
Sufficient for SequOMP with known conditional ranks | $m > \frac{8(1+\mathrm{SNR} \cdot \mathrm{MAR})}{\mathrm{SNR} \cdot \mathrm{MAR}}\, k \log(n-k)$; from Theorem 1 (Section IV-D) | $m > 8k \log(n-k)$; from Theorem 1 (Section IV-E)
Necessary and sufficient for lasso | complicated; see [3] | $m > 2k \log(n-k)$; Wainwright [3]
Sufficient for OMP | unknown | $m > 2k \log(n-k)$; Fletcher and Rangan [28]
Sufficient for thresholding (13) | $m > \frac{8(1+\mathrm{SNR})}{\mathrm{MAR} \cdot \mathrm{SNR}}\, k \log(n-k)$; Fletcher et al. [4, Thm. 2] | $m > \frac{8}{\mathrm{MAR}}\, k \log(n-k)$

TABLE I

Summary of results on measurement scalings for asymptotic reliable detection for various detection algorithms.
Only leading terms are shown. See body for definitions and additional technical limitations.



• SNR saturation: In addition to the $L(k,n)/\log(n-k)$
offset, thresholding also requires a factor of 1 + SNR more
measurements than ML. This 1 + SNR factor has a natural
interpretation as intrinsic interference: When detecting
any one component of the vector x, thresholding sees the
energy from the other $n - 1$ components of the signal as
interference. This interference is distinct from the additive
noise d, and it increases the effective noise by a factor
of 1 + SNR.

The intrinsic interference results in a large performance
gap at high SNRs. In particular, as $\mathrm{SNR} \to \infty$, (14)
reduces to

    $m > \dfrac{2(1+\delta)\, k\, L(k,n)}{\mathrm{MAR}}.$    (18)

In contrast, ML may be able to succeed with a scaling
m = O(k) for high SNRs.
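
To make the two effects concrete, the sketch below evaluates the right-hand sides of (12) and (14) together with the factor L(k, n) from (15). The function names and the parameter values (n = 1000, k = 10, MAR = 0.5) are hypothetical choices for illustration only.

```python
import math

def L(k, n):
    """L(k, n) from (15)."""
    return (math.sqrt(math.log(n - k)) + math.sqrt(math.log(k))) ** 2

def ml_necessary(n, k, snr, mar, delta=0.0):
    """Right-hand side of the necessary condition (12) for ML."""
    return 2.0 * (1.0 - delta) / (mar * snr) * k * math.log(n - k) + k

def thresholding_sufficient(n, k, snr, mar, delta=0.0):
    """Right-hand side of the sufficient condition (14) for thresholding."""
    return 2.0 * (1.0 + delta) * (1.0 + snr) * k * L(k, n) / (snr * mar)

n, k, mar = 1000, 10, 0.5
for snr in (1.0, 10.0, 100.0, 1e12):
    print(snr, ml_necessary(n, k, snr, mar), thresholding_sufficient(n, k, snr, mar))
```

As the SNR grows, the ML bound falls toward k while the thresholding bound levels off at 2kL(k, n)/MAR, which is the saturation expressed by (18).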

D. Lasso and OMP Detection 

While ML has clear advantages over thresholding, it is not 
computationally tractable for large problems. One practical 
method is lasso [13], also called basis pursuit denoising [12].
The lasso estimate of x is obtained by solving the convex 
optimization 



    $\hat{x} = \arg\min_{x} \left( \|y - Ax\|_2^2 + \mu \|x\|_1 \right),$

where $\mu > 0$ is an algorithm parameter that encourages
sparsity in the solution $\hat{x}$. The nonzero components of $\hat{x}$ can
then be used as an estimate of $I_{\rm true}$.
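
One simple way to solve the lasso problem numerically is the iterative soft-thresholding algorithm (ISTA). The sketch below is an assumption of this illustration, not a solver used in the cited works; the function names, the iteration count, and the toy data are hypothetical. It minimizes the objective $\|y - Ax\|_2^2 + \mu\|x\|_1$ and reads off the support from the nonzero entries.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding operator."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, mu, n_iter=500):
    """Minimize ||y - A x||_2^2 + mu * ||x||_1 by proximal gradient steps (ISTA)."""
    # The gradient of the quadratic term is 2 A'(Ax - y), with Lipschitz constant 2 ||A||_2^2.
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - y)
        x = soft_threshold(x - step * grad, step * mu)
    return x

# Toy check with A = I, where the minimizer is soft_threshold(y, mu / 2).
A = np.eye(3)
y = np.array([3.0, 0.2, -1.0])
x_hat = lasso_ista(A, y, mu=1.0)
support_estimate = set(np.flatnonzero(x_hat).tolist())
print(x_hat, support_estimate)
```

With an identity matrix the small entry 0.2 is shrunk to exactly zero, so the estimated support is {0, 2}; for general A the choice of $\mu$ controls how many components survive.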

Wainwright [3] has given necessary and sufficient conditions
for asymptotic reliable detection with lasso. Partly because
of freedom in the choice of a sequence of parameters $\mu(n)$,
the finite SNR results are difficult to interpret. Under certain
conditions with SNR growing unboundedly with n, matching
necessary and sufficient conditions can be found. Specifically,
if m, n and $k \to \infty$, with $\mathrm{SNR} \cdot \mathrm{MAR} \to \infty$, the scaling



m > 2k log(n - k) + k + 1 



(19) 



is both necessary and sufficient for asymptotic reliable detec- 
tion. 

Another common approach to support detection is the 
OMP algorithm [7]-[9]. This was analyzed by Tropp and
Gilbert [27] in a setting with no noise. This was generalized
to the present setting with noise by Fletcher and Rangan [28].
The result is very similar to condition (19): If m, n and
$k \to \infty$, with $\mathrm{SNR} \cdot \mathrm{MAR} \to \infty$, a sufficient condition for
asymptotic reliable recovery is

    $m > 2k \log(n-k).$    (20)

The main result of [28] also allows uncertainty in k.

The conditions (19) and (20) are both shown in Table I. As
usual, the table entries are simplified by including only the
leading terms.

The lasso and OMP scaling laws, (19) and (20), can be
compared with the high SNR limit for the thresholding scaling
law in (18). This comparison shows the following:

• Removal of the constant offset: The $L(k,n)$ factor in
the thresholding expression is replaced by a $\log(n-k)$
factor in the lasso and OMP scaling laws. Similar to 
the discussion above, this implies that lasso and OMP 
could require up to 4 times fewer measurements than 
thresholding. 

• Dynamic range: In addition, both the lasso and OMP 
methods do not have a dependence on MAR. This gain 
can be large when there is high dynamic range, i.e., MAR 
is near zero. 
• Limits at high SNR: We also see from (19) and (20) that
both lasso and OMP are unable to achieve the scaling
m = O(k) that may be achievable with ML at high
SNR. Instead, both lasso and OMP have the scaling
$m = O(k \log(n-k))$, similar to the minimum scaling
possible with thresholding.

E. Other Sparsity Detection Algorithms 

Recent interest in compressed sensing has led to a plethora 
of algorithms beyond OMP and lasso. Empirical evidence sug- 
gests that the most promising algorithms for support detection 
are the sparse Bayesian learning methods developed in the 
machine learning community [32] and introduced into signal
processing applications in [33], with related work in [34].
Unfortunately, a comprehensive summary of these algorithms
is far beyond the scope of this paper. Our interest is not in
finding the optimal algorithm, but rather to explain qualitative 
differences between algorithms and to demonstrate the value 
of knowing conditional ranks a priori. 

IV. Sequential Orthogonal Matching Pursuit 

The results summarized in the previous section suggest a 
large performance gap between ML detection and practical 
algorithms such as thresholding, lasso and OMP, especially 
when the SNR is high. Specifically, as the SNR increases, the 
performance of these practical methods saturates at a scaling 
in the number of measurements that can be significantly higher 
than that for ML. 

In this section, we introduce an OMP-like algorithm, which 
we call sequential orthogonal matching pursuit, that under 
favorable conditions can break this barrier. Specifically, in 
some cases, the performance of SequOMP does not saturate 
at high SNR. 

A. Algorithm: SequOMP 

Given a received vector y, threshold level $\mu > 0$, and detection
order $\pi$ (a permutation on {1, 2, . . . , n}), the algorithm
produces an estimate $\hat{I}_S$ of the support $I_{\rm true}$ with the following
steps:

1) Initialize the counter j = 1 and set the initial support
estimate to empty: $\hat{I}(0) = \emptyset$.

2) Compute $P(j) a_{\pi(j)}$ where $P(j)$ is the projection operator
onto the orthogonal complement of the span of
$\{a_{\pi(\ell)} : \pi(\ell) \in \hat{I}(j-1)\}$.

3) Compute the correlation

    $\rho(j) = \dfrac{|a_{\pi(j)}' P(j) y|^2}{\|P(j) a_{\pi(j)}\|^2 \, \|P(j) y\|^2}.$

4) If $\rho(j) > \mu$, add the index $\pi(j)$ to $\hat{I}(j-1)$. That is,
$\hat{I}(j) = \hat{I}(j-1) \cup \{\pi(j)\}$. Otherwise, set $\hat{I}(j) = \hat{I}(j-1)$.

5) Increment j = j + 1. If $j \le n$, return to step 2.

6) The final estimate of the support is $\hat{I}_S = \hat{I}(n)$.
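
The six steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: indices are 0-based, the projector is updated by a rank-one deflation rather than recomputed from scratch, and the guard value `1e-12`, the function name `sequomp`, and the toy data are hypothetical.

```python
import numpy as np

def sequomp(A, y, mu, order=None):
    """One-pass sequential OMP: test each column once, in the given order."""
    m, n = A.shape
    order = range(n) if order is None else order
    P = np.eye(m)                      # projector onto the complement of the detected columns
    support = []
    for idx in order:
        Pa = P @ A[:, idx]             # step 2: projected column
        Py = P @ y                     # projected observation
        denom = (Pa @ Pa) * (Py @ Py)
        # step 3: correlation; a' P y equals (P a)'(P y) because P is an orthogonal projector
        rho = (Pa @ Py) ** 2 / denom if denom > 1e-12 else 0.0
        if rho > mu:                   # step 4: accept the index and deflate
            support.append(idx)
            u = Pa / np.linalg.norm(Pa)
            P = P - np.outer(u, u)
    return support

A = np.eye(4)
y = np.array([2.0, 0.0, 0.0, 1.0])                      # strong component at 0, weak at 3
strong_first = sequomp(A, y, mu=0.5)                    # order 0, 1, 2, 3
weak_first = sequomp(A, y, mu=0.5, order=[3, 2, 1, 0])  # reversed order
print(strong_first, weak_first)
```

In this toy case, visiting the strong component first removes its energy before the weak one is tested, so both are found; visiting the weak component first leaves it below the threshold. This is only a noise-free illustration of why the detection order matters.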

The SequOMP algorithm can be thought of as an iterative 
version of thresholding with the difference that, after a nonzero 
component is detected, subsequent correlations are performed 



only in the orthogonal complement to the corresponding 
column of A. The method is identical to the standard OMP 
algorithm of [7]-[9], except that SequOMP passes through the
data only once, in a fixed order. For this reason, SequOMP is 
computationally simpler than standard OMP. 

As simulations will illustrate later, SequOMP generally 
has much worse performance than standard OMP. It is not 
intended as a competitive practical alternative. Our interest 
in the algorithm lies in the fact that we can prove positive 
results for SequOMP. Specifically, we will be able to show that 
this simple algorithm, when used in conjunction with known 
conditional ranks, can achieve a fundamentally better scaling 
at high SNRs than what has been proven is achievable with 
methods such as lasso and OMP. 

B. Sequential OMP Performance 

The analyses in Section III hold for deterministic vectors
x. Recall the partially-random signal model (2) where $b_j$
is a Bernoulli($\lambda$) random variable while the value of $x_j$
conditional on $x_j$ being nonzero remains deterministic; i.e.,
$s_j$ is deterministic.

Let $p_j$ denote the conditional energy of $x_j$, conditioned on
$b_j = 1$ (i.e., $j \in I_{\rm true}$). Then

    $p_j = |s_j|^2, \qquad j = 1, 2, \ldots, n.$    (21)

We will call $\{p_j\}_{j=1}^n$ the power profile. Since $\Pr(b_j = 1) = \lambda$
for every j, the average value of SNR(x) in (5) is given by

    $\mathrm{SNR} = \lambda \sum_{j=1}^{n} p_j.$    (22)

Also, in analogy with MAR(x) and $\mathrm{SNR}_{\min}(x)$ in (7) and (8),
define

    $\mathrm{SNR}_{\min} = \min_j p_j,$
    $\mathrm{MAR} = \dfrac{\lambda n}{\mathrm{SNR}} \min_j p_j = \dfrac{\lambda n\, \mathrm{SNR}_{\min}}{\mathrm{SNR}}.$

Note that the power profile $p_j$ and the quantities SNR, $\mathrm{SNR}_{\min}$
and MAR as defined above are deterministic.

To simplify notation, we henceforth assume $\pi$ is the identity
permutation, i.e., the detection order in SequOMP is simply
(1, 2, . . . , n). A key parameter in analyzing the performance
of SequOMP is what we will call the minimum signal-to-interference
and noise ratio (MSINR)

    $\gamma = \min_{\ell = 1, \ldots, n} p_\ell / \sigma^2(\ell),$    (23)

where $\sigma^2(\ell)$ is given by

    $\sigma^2(\ell) = 1 + \lambda \sum_{j=\ell+1}^{n} p_j.$    (24)

The parameters $\gamma$ and $\sigma^2(\ell)$ have simple interpretations: Suppose
SequOMP has correctly detected $b_j$ for all $j < \ell$. Then,
in detecting $b_\ell$, the algorithm sees the noise d with power
$E[\|d\|^2] = 1$ plus, for each component $j > \ell$, an interference
power $p_j$ with probability $\lambda$. Hence, $\sigma^2(\ell)$ is the total average
interference power seen when detecting $b_\ell$, assuming perfect
cancellation up to that point. Since the conditional power of
$x_\ell$ is $p_\ell$, the ratio $p_\ell/\sigma^2(\ell)$ in (23) represents the average
SINR seen while detecting component $\ell$. The value $\gamma$ is the
minimum SINR over all n components.
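
The MSINR in (23)-(24) is straightforward to compute from a power profile listed in detection order. The following sketch assumes NumPy; the function name `msinr` and the three-component profile are hypothetical examples.

```python
import numpy as np

def msinr(p, lam):
    """Return (gamma, per-component SINR) for power profile p and activity probability lam."""
    p = np.asarray(p, dtype=float)
    tail = np.cumsum(p[::-1])[::-1] - p     # sum of p_j over j > l (not-yet-detected components)
    sigma2 = 1.0 + lam * tail               # (24): unit noise power plus average interference
    sinr = p / sigma2
    return float(sinr.min()), sinr          # (23): minimum over all components

gamma, sinr = msinr([4.0, 2.0, 1.0], lam=0.5)
print(gamma, sinr)
```

For this profile the interference terms are 2.5, 1.5, and 1, so the last (weakest) component sets the MSINR at 1.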

Theorem 1: Let $\lambda = \lambda(n)$, $m = m(n)$, and the power
profile $\{p_j\}_{j=1}^n = \{p_j(n)\}_{j=1}^n$ be deterministic quantities that
all vary with n satisfying the limits

    $m - \lambda n \to \infty, \quad \lambda n \to \infty, \quad (1-\lambda) n \to \infty, \quad \text{and} \quad \gamma \to 0.$

Also, assume the sequence of power profiles satisfies the limit

    $\lim_{n \to \infty} \max_{i = 1, \ldots, n-1} \log(n)\, \sigma^{-4}(i) \sum_{j > i} p_j^2 = 0.$    (25)

Finally, assume that for all n,

    $m > \dfrac{2(1+\delta)\, L(n,\lambda)}{\gamma} + \lambda n,$    (26)

for some $\delta > 0$ and $L(n, \lambda)$ defined in (15). Then, there exists
a sequence of thresholds, $\mu = \mu(n)$, such that SequOMP
will achieve asymptotic reliable detection. The sequence of
threshold levels can be selected independent of the sequence
of power profiles.

Proof: See Appendix A. ∎

The theorem provides a simple sufficient condition on the
number of measurements as a function of the MSINR $\gamma$,
probability $\lambda$, and dimension n. The condition (25) is somewhat
technical; we will verify its validity in examples. The
remainder of this section discusses some of the implications
of this theorem.

C. Most Favorable Detection Order with Known Conditional 
Ranks 

Suppose that the ordering of the conditional power levels
$\{p_j\}_{j=1}^n$ is known at the detector, but possibly not the values
themselves. Reordering the power profile is equivalent to
changing the detection order, so we seek the most favorable
ordering of the power profile. Since $\sigma^2(\ell)$ defined in (24)
involves the sum of the tail of the power profile, the MSINR
defined in (23) is maximized when the power profile is nonincreasing:

    $p_1 \ge p_2 \ge \cdots \ge p_n = \mathrm{SNR}_{\min}.$    (27)

In other words, the best detection order for SequOMP is from
strongest component to weakest component.

Using (27), it can be verified that the MSINR $\gamma$ is bounded
below by

    $\gamma \ge \dfrac{\mathrm{SNR}_{\min}}{1 + \lambda n\, \mathrm{SNR}_{\min}} = \dfrac{\mathrm{SNR} \cdot \mathrm{MAR}}{\lambda n (1 + \mathrm{SNR} \cdot \mathrm{MAR})}.$    (28)

Furthermore, the sufficiency of the scaling (26) shows that

    $m > \dfrac{2(1+\delta)\, \lambda n\, (1 + \mathrm{SNR} \cdot \mathrm{MAR})}{\mathrm{SNR} \cdot \mathrm{MAR}}\, L(n,\lambda) + \lambda n$    (29)

is sufficient for asymptotic reliable detection. This expression
is shown in Table I with the additional simplification that
$L(n,\lambda) < 4\log(n(1-\lambda))$ for $\lambda \in (0, 1/2)$. To keep the
notation consistent with the expressions for the other entries
in the table, we have used k for $\lambda n$, which is the average
number of nonzero entries of x.

When $\mathrm{SNR} \to \infty$, (29) simplifies to

    $m > 2(1+\delta)\, \lambda n\, L(n,\lambda) + \lambda n.$    (30)

This is identical to the lasso and OMP performance except for
the factor $L(n,\lambda)/\log((1-\lambda)n)$, which lies in (0, 4) for $\lambda \in (0, 1/2)$.
In particular, the minimum number of measurements
does not depend on MAR; therefore, similar to lasso and OMP,
SequOMP can theoretically detect components that are much
below the average power at high SNRs. More generally, we
can say that knowledge of the conditional ranks of the powers
enables a very simple algorithm to achieve resistance to large
dynamic ranges.
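
The claim that a nonincreasing order maximizes the MSINR, and the lower bound (28), can be checked by brute force on a small profile. The sketch below enumerates all orderings of a hypothetical four-component profile; the power values, $\lambda = 0.5$, and the helper name `msinr` are illustrative assumptions.

```python
import itertools
import numpy as np

def msinr(p, lam):
    """MSINR gamma from (23)-(24) for a profile p listed in detection order."""
    p = np.asarray(p, dtype=float)
    tail = np.cumsum(p[::-1])[::-1] - p          # interference from later components
    return float(np.min(p / (1.0 + lam * tail)))

lam = 0.5
powers = [8.0, 1.0, 4.0, 2.0]                    # hypothetical conditional powers
n = len(powers)

gamma_sorted = msinr(sorted(powers, reverse=True), lam)        # strongest-first order (27)
gamma_best = max(msinr(q, lam) for q in itertools.permutations(powers))
gamma_worst = min(msinr(q, lam) for q in itertools.permutations(powers))

snr_min = min(powers)
bound = snr_min / (1.0 + lam * n * snr_min)      # lower bound (28)
print(gamma_sorted, gamma_best, gamma_worst, bound)
```

For this example the strongest-first order attains the best MSINR over all 24 orderings and stays above the bound (28), whereas the worst ordering falls well below the sorted value.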

D. Optimal Power Shaping 

The MSINR lower bound in (28) is achieved as $n \to \infty$ and
the power profile is constant (all $p_j$'s are equal). Thus, opposite
to thresholding, a constant power profile is in some sense the
worst power profile for a given $\mathrm{SNR}_{\min}$ for the SequOMP
algorithm.

This raises the question: What is the most favorable power
profile? Any power profile maximizing the MSINR $\gamma$ subject
to a constraint on total SNR (22) will achieve the minimum
in (23) for every $\ell$ and thus satisfy

    $p_\ell = \gamma \left( 1 + \lambda \sum_{j=\ell+1}^{n} p_j \right), \qquad \ell = 1, 2, \ldots, n.$    (31)

The solution to (31) and (22) is given by

    $p_\ell = \gamma_{\rm opt} (1 + \gamma_{\rm opt} \lambda)^{n-\ell}, \qquad \ell = 1, 2, \ldots, n,$    (32a)

where

    $\gamma_{\rm opt} = \dfrac{1}{\lambda} \left[ (1 + \mathrm{SNR})^{1/n} - 1 \right] \approx \dfrac{1}{\lambda n} \log(1 + \mathrm{SNR})$    (32b)

and the approximation holds for large n.² Again, some algebra
shows that when $\lambda$ is bounded away from zero, the power
profile in (32) will satisfy the technical condition (25) when
$\log(1 + \mathrm{SNR}) = o(n/\log(n))$.

The power profile (32a) is exponentially decreasing in
the index order $\ell$. Thus, components early in the detection
sequence are allocated exponentially higher power than components
later in the sequence. This allocation ensures that early
components have sufficient power to overcome the interference
from all the components later in the detection sequence that
are not yet cancelled.
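
The closed form in (32) can be verified numerically: the profile should meet the total SNR constraint (22) and give the same SINR for every component, as required by (31). The sketch below uses hypothetical values (SNR = 100, $\lambda = 0.1$, n = 50) and an illustrative function name `optimal_profile`.

```python
import math
import numpy as np

def optimal_profile(snr, lam, n):
    """Exponential power profile (32a) with gamma_opt from the exact form of (32b)."""
    gamma_opt = ((1.0 + snr) ** (1.0 / n) - 1.0) / lam
    ell = np.arange(1, n + 1)
    p = gamma_opt * (1.0 + gamma_opt * lam) ** (n - ell)
    return p, gamma_opt

snr, lam, n = 100.0, 0.1, 50
p, gamma_opt = optimal_profile(snr, lam, n)

total_snr = lam * p.sum()                        # should reproduce (22)
tail = np.cumsum(p[::-1])[::-1] - p              # sum of p_j over j > l
sinr = p / (1.0 + lam * tail)                    # should equal gamma_opt for every l
approx = math.log(1.0 + snr) / (lam * n)         # large-n approximation in (32b)
print(total_snr, gamma_opt, approx, sinr.min(), sinr.max())
```

At n = 50 the approximation in (32b) is already within a few percent of the exact value and slightly underestimates it.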

Substituting (32b) into (26), we see that the scaling

    $m > \dfrac{2(1+\delta)\, L(n,\lambda)}{\log(1 + \mathrm{SNR})}\, \lambda n + \lambda n$    (33)

is sufficient for SequOMP to achieve asymptotic reliable
detection with the best-case power profile. This expression is
shown in Table I, again with the additional simplification that

$L(n,\lambda) < 4\log(n(1-\lambda))$ for $\lambda \in (0, 1/2)$.



² The solution (32) is a special case of a more general result in
Section IV-G.
E. SNR Saturation

As discussed earlier, a major problem with thresholding,
lasso, and OMP is that their performances "saturate" with high
SNR. That is, even as the SNR scales to infinity, the minimum
number of measurements scales as $m = \Theta(\lambda n \log((1-\lambda)n))$.
In contrast, optimal ML detection can achieve a scaling
$m = O(\lambda n)$, when the SNR is sufficiently high.

A consequence of (33) is that SequOMP with exponential
power shaping can overcome this barrier. Specifically, if we
take the scaling of $\mathrm{SNR} = \Theta(\lambda n)$ in (33), apply the bound
$L(n,\lambda) < 4\log(n(1-\lambda))$ for $\lambda \in (0, 1/2)$, and assume that
$\lambda$ is bounded away from zero, we see that asymptotically,
SequOMP requires only

    $m > 9 \lambda n$    (34)

measurements. In this way, unlike thresholding and lasso,
SequOMP is able to succeed with scaling $m = O(\lambda n)$ when
$\mathrm{SNR} \to \infty$.
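
The constant 9 in (34) can be traced numerically. The sketch below evaluates the right-hand side of (33) divided by $\lambda n$, using the bound $L(n,\lambda) < 4\log(n(1-\lambda))$ and the specific choice SNR = $\lambda n$ (one instance of the $\Theta(\lambda n)$ scaling); the function name, $\lambda = 0.1$, and the values of n are assumptions made for illustration.

```python
import math

def shaped_measurement_ratio(n, lam, delta=0.0):
    """Right-hand side of (33) divided by lam * n, with L(n, lam) replaced by its upper bound."""
    snr = lam * n                                   # assumed instance of SNR = Theta(lam * n)
    l_bound = 4.0 * math.log(n * (1.0 - lam))
    return 2.0 * (1.0 + delta) * l_bound / math.log(1.0 + snr) + 1.0

ratios = [shaped_measurement_ratio(n, lam=0.1) for n in (1e6, 1e12, 1e100)]
print(ratios)
```

The ratio decreases toward 9 as n grows, but only logarithmically, so the asymptotic constant in (34) is approached slowly at practical problem sizes.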

F. Power Shaping with Sparse Bayesian Learning 

The fact that power shaping can provide benefits when 
combined with certain iterative detection algorithms confirms 
the observations in the work of Wipf and Rao [35]. That
work considers signal detection with a certain sparse Bayesian
learning (SBL) algorithm. They show the following result:
Suppose x has k nonzero components and $p_i$, $i = 1, 2, \ldots, k$,
is the power of the ith largest component. Then, for a given
measurement matrix A, there exist constants $\nu_i > 1$ such that
if

    $p_i > \nu_i\, p_{i-1}, \qquad i = 2, 3, \ldots, k,$    (35)

the SBL algorithm will correctly detect the sparsity pattern of
x.

The condition (35) shows that a certain growth in the powers
can guarantee correct detection. The parameters $\nu_i$, however,
depend in some complex manner on the matrix A, so the
appropriate growth is difficult to compute. They also provide 
strong empirical evidence that shaping the power with cer- 
tain profiles can greatly reduce the number of measurements 
needed. 

The results in this paper add to Wipf and Rao's observations 
showing that growth in the powers can also assist SequOMP. 
Moreover, for SequOMP, we can explicitly derive the optimal 
power profile for certain large random matrices. 

This is not to say that SequOMP is better than SBL. In fact, 
empirical results in [33] suggest that SBL will outperform
OMP, which will in turn do better than SequOMP. As we 
have stressed before, the point of analyzing SequOMP here is 
that we can derive concrete analytic results. These results may 
provide guidance for more sophisticated algorithms. 

G. Robust Power Shaping 

The above analysis shows certain benefits of SequOMP 
used in conjunction with power shaping. However, these 
gains are theoretically only possible at infinite block lengths. 
Unfortunately, when the block length is finite, power shaping 
can actually reduce the performance. 



The problem is that when a nonzero component is not de- 
tected in SequOMP, that component's energy is not cancelled 
out and remains as interference for all subsequent components 
in the detection sequence. With power shaping, components 
early in the detection sequence have much higher power 
than components later in the sequence, so an early missed 
detection can make subsequent detection difficult. As block 
length increases, the probability of missed detection can be 
driven to zero. But at any finite block length, the probability 
of a missed detection early in the sequence will always be 
nonzero. 

The work [36] observed a similar problem when successive
interference cancellation is used in a CDMA uplink. To mitigate
the problem, [36] proposed to adjust the power allocations
to make them more robust to detection errors early in the
detection sequence. The same technique, which we will call
robust power shaping, can be applied to SequOMP as follows.

The condition (31) is motivated by maintaining a constant
MSINR through the detection process, assuming all components
with indexes $j < \ell$ have been correctly detected and
subtracted. An alternative, following [36], is to assume that
some fixed fraction $\theta \in [0, 1]$ of the energy of components
early in the detection sequence is not cancelled out due to
missed detections. We will call $\theta$ the leakage fraction. With
nonzero leakage, the condition (31) is replaced by

$$p_\ell = \gamma \left( 1 + \theta\lambda \sum_{j=1}^{\ell-1} p_j + \lambda \sum_{j=\ell+1}^{n} p_j \right), \quad \ell = 1, 2, \ldots, n. \tag{36}$$

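For given $\gamma$, $\lambda$, and $\theta$, (36) is a system of linear equations that
determines the power profile $\{p_\ell\}$; one can vary $\gamma$ until the
power profile provides the desired SNR according to (22).

A closed-form solution to (36) provides some additional
insight. Adding and subtracting SNR inside the parentheses in
(36) while also using (22) yields

$$p_\ell = \gamma \left( 1 + \mathrm{SNR} - \lambda \sum_{j=1}^{n} p_j + \theta\lambda \sum_{j=1}^{\ell-1} p_j + \lambda \sum_{j=\ell+1}^{n} p_j \right),$$

where $\mathrm{SNR} - \lambda \sum_{j=1}^{n} p_j = 0$ by (22). This can be rearranged to

$$(1 + \gamma\lambda)\, p_\ell = \gamma \left( 1 + \mathrm{SNR} - (1 - \theta)\lambda \sum_{j=1}^{\ell-1} p_j \right). \tag{37}$$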

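Using standard techniques for solving linear constant-coefficient
difference equations,

$$p_\ell = \frac{\mathrm{SNR}}{\lambda} \cdot \frac{(1 - \zeta)\,\zeta^{\ell-1}}{1 - \zeta^n}, \tag{38a}$$

where

$$\zeta = \frac{1 + \gamma\theta\lambda}{1 + \gamma\lambda} \tag{38b}$$

and

$$\gamma = \frac{1}{\lambda} \cdot \frac{1 - \left( \frac{1 + \theta\,\mathrm{SNR}}{1 + \mathrm{SNR}} \right)^{1/n}}{\left( \frac{1 + \theta\,\mathrm{SNR}}{1 + \mathrm{SNR}} \right)^{1/n} - \theta}. \tag{38c}$$

Notice that $\theta < 1$ implies $\zeta < 1$, so the power profile (38a)
is decreasing, as in the case without leakage in Section IV-D.
Setting $\theta = 0$ recovers (32).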
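
The closed form (38) can be checked numerically against the defining condition (36) and the SNR constraint. The following is a minimal sketch, assuming that (22) reads $\mathrm{SNR} = \lambda \sum_j p_j$ (as the cancellation leading to (37) suggests) and that SNR is in linear units; the function names and parameter values are illustrative, not taken from the paper's simulation code.

```python
def robust_power_profile(n, lam, snr, theta):
    """Return (gamma, zeta, p) from (38a)-(38c); p[l-1] holds p_l."""
    ratio = ((1.0 + theta * snr) / (1.0 + snr)) ** (1.0 / n)
    gamma = (1.0 - ratio) / (lam * (ratio - theta))            # (38c)
    zeta = (1.0 + gamma * theta * lam) / (1.0 + gamma * lam)   # (38b)
    scale = snr * (1.0 - zeta) / (lam * (1.0 - zeta ** n))
    p = [scale * zeta ** (l - 1) for l in range(1, n + 1)]     # (38a)
    return gamma, zeta, p


def leakage_residual(p, gamma, lam, theta):
    """Largest absolute violation of condition (36) over all l."""
    worst = 0.0
    for l in range(len(p)):
        leaked = theta * lam * sum(p[:l])      # uncancelled earlier components
        later = lam * sum(p[l + 1:])           # components not yet processed
        worst = max(worst, abs(p[l] - gamma * (1.0 + leaked + later)))
    return worst


# Illustrative values: n = 100, lambda = 0.1, SNR = 100 (20 dB), theta = 0.1.
n, lam, snr, theta = 100, 0.1, 100.0, 0.1
gamma, zeta, p = robust_power_profile(n, lam, snr, theta)
```

Here `leakage_residual(p, gamma, lam, theta)` should vanish up to rounding error, and `lam * sum(p)` should return the target SNR. The sketch requires $\left(\frac{1 + \theta\,\mathrm{SNR}}{1 + \mathrm{SNR}}\right)^{1/n} > \theta$ so that $\gamma$ is positive.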

V. Numerical Simulation 

A. Threshold Settings 

The performances of the thresholding and SequOMP algorithms
depend on the setting of the threshold level $\mu$. In
the theoretical analysis of Theorem 1, an ideal threshold
is calculated for the limit of infinite block length, which
guarantees perfect detection of the support. In simulations with
finite block lengths, it is more reasonable to set the threshold
based on a desired false alarm probability. A false alarm is
the event that the algorithm falsely detects that a component
is nonzero when it is not. For the thresholding algorithm in
Section III-C or the SequOMP algorithm in Section IV-A, the
false alarm probability is

$$p_{\rm FA} = \Pr\left( j \in \hat{I} \mid j \notin I_{\rm true} \right) = \Pr\left( \rho(j) > \mu \mid j \notin I_{\rm true} \right),$$

which is the probability that the correlation $\rho(j)$ exceeds the
threshold $\mu$ when $x_j = 0$.

In the simulations below, we adjust the threshold $\mu$ by
trial and error to achieve a fixed false alarm probability 
(typically ppA ~ 10^'^), and then measure the missed detection 
probability given by 

$$p_{\rm MD} = \Pr\left( j \notin \hat{I} \mid j \in I_{\rm true} \right).$$

The missed detection probability is averaged over all $j \in I_{\rm true}$.
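
The threshold calibration can be illustrated with a short Monte Carlo sketch. It is not the procedure used for the figures: instead of manual trial and error it takes an empirical quantile of the correlation under the null hypothesis, it uses the simple thresholding statistic $\rho = |a'y|^2 / (\|a\|^2 \|y\|^2)$ with a real Gaussian column $a$ independent of $y$, and the values of `m`, `trials`, and `p_fa` are arbitrary illustrative choices.

```python
import math
import random


def null_correlations(m, trials, rng):
    """Draw rho = |a'y|^2 / (|a|^2 |y|^2) for a column a independent of y."""
    out = []
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(m)]
        y = [rng.gauss(0.0, 1.0) for _ in range(m)]
        dot = sum(ai * yi for ai, yi in zip(a, y))
        na = sum(ai * ai for ai in a)
        ny = sum(yi * yi for yi in y)
        out.append(dot * dot / (na * ny))
    return out


def calibrate_threshold(samples, p_fa):
    """Return mu such that roughly a fraction p_fa of the samples exceeds mu."""
    ordered = sorted(samples)
    k = int(math.ceil((1.0 - p_fa) * len(ordered))) - 1
    k = min(max(k, 0), len(ordered) - 1)
    return ordered[k]


rng = random.Random(0)
samples = null_correlations(m=50, trials=4000, rng=rng)
mu = calibrate_threshold(samples, p_fa=0.05)
empirical_fa = sum(s > mu for s in samples) / len(samples)
```

In a full simulation the null statistics would be collected from the detector itself (thresholding or SequOMP) over the inactive components, and the missed detection rate would then be measured at the calibrated `mu`.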

B. Evaluation of Bounds 

We first compare the actual performance of the SequOMP
algorithm with the bound in Theorem 1. Fig. 1 plots the
simulated missed detection probability for SequOMP
at various SNR levels, probabilities of nonzero components $\lambda$,
and numbers of measurements $m$. In all these simulations, the
number of components was fixed to n = 100. The false alarm 
probability was set to ppA = 10~^. The robust power profile 
of Section IV-G is used with a leakage fraction $\theta = 0.1$.

The dark line in Fig. 1 represents the number of measurements
$m$ for which Theorem 1 would theoretically guarantee
reliable detection of the support at infinite block lengths. To
apply the theorem, we used the MSINR $\gamma = \gamma(\theta)$ in (38c).
At the block lengths considered in this simulation, the missed
detection probability at the theoretical sufficient condition is
small, typically between 2% and 10%. Thus, even at moderate
block lengths, the theoretical bound in Theorem 1 can provide
a good estimate for the number of measurements for reliable
detection.

C. SequOMP vs. Thresholding 

Fig. 2 compares the performances of thresholding and
SequOMP with power shaping. In the simulations, $n = 100$,
$\lambda = 0.1$, and the total SNR is 20 dB. The number of
measurements m was varied, and for each m, the missed 



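[Fig. 1 plot omitted: three panels, one per SNR level (the legible panel titles read SNR = 10 and SNR = 20); horizontal axis: activity probability $\lambda$, with ticks at 0.05, 0.1, 0.15, and 0.2.]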



Fig. 1. SequOMP with power shaping: Each colored bar represents the 
SequOMP algorithm's missed detection probability as a function of the 
number of measurements m, with different bars showing different activity 
probabilities $\lambda$ and SNR levels. The missed detection probabilities were
estimated with 1000 Monte Carlo trials. The number of users is set to 
n = 100, the false alarm probability is ppA = 10""^. The power shaping is 
performed with a leakage fraction of $\theta = 0.1$. The dark black line shows the
theoretical number of measurements $m$ required in Theorem 1 with $\gamma = \gamma(\theta)$
in (38c).




[Fig. 2 plot omitted: horizontal axis: number of measurements $m$, with ticks from 50 to 250.]



Fig. 2. Missed detection probabilities for various detection methods and 
power profiles. The number of users is n = 100, SNR = 20 dB, the activity 
probability is A = 0.1, and the false alarm rate is pp\ = 10^^. For the 
SequOMP algorithm with power shaping, the leakage fraction was set to
$\theta = 0.1$.

detection probability was estimated with 1000 Monte Carlo 
trials. 

As expected, thresholding requires the largest number of
measurements. For a missed detection rate of 1%, Fig. 2
shows that thresholding requires approximately $m \approx 210$
measurements. In this simulation of thresholding, the power
profile is constant. Employing SequOMP but keeping the
power profile constant decreases the number of measurements
somewhat, to $m \approx 170$ for a 1% missed detection rate.
However, using SequOMP with power shaping decreases the
number of measurements by more than a factor of two, to
$m \approx 95$. Thus, at least at high SNRs, SequOMP may provide
significant gains over simple thresholding.

Fig. 3. Power shaping with OMP. Plotted are the missed detection probabilities
with OMP using a constant power profile and using power shaping with a leakage
fraction set to $\theta = 0.1$. Other simulation assumptions are identical to Fig. 2.

D. OMP with Power Shaping 

As discussed earlier, although SequOMP can provide gains 
over thresholding, its performance is typically worse than 
OMP, even if SequOMP is used with power shaping. (Our 
interest in SequOMP is that it is simple to analyze.) 

While we do not have any analytical result, the simulation in
Fig. 3 shows that power shaping provides gains with OMP as
well. Specifically, when the power profile is constant, $m \approx 85$
measurements are needed for a missed detection probability
of 1%. This number is slightly lower than that required by
SequOMP, even when SequOMP uses power shaping. When
OMP is used with power shaping, the number of measurements
decreases to $m \approx 65$.

VI. Conclusions 

Methods such as OMP and lasso, which are widely used in 
sparse signal support detection problems, exhibit advantages 
over thresholding but still fall far short of the performance 
of optimal (ML) detection at high SNRs. Analysis of the 
SequOMP algorithm has shown that knowledge of conditional 
rank of signal components enables performance similar to 
OMP and lasso at a lower complexity. Furthermore, in the 
most favorable situations, conditional rank knowledge changes 
the fundamental scaling of performance with SNR so that 
performance no longer saturates with SNR. 

Appendix 
Proof of Theorem 1

A. Proof Outline 

At a high level, the proof of Theorem [T] is similar to the 
proof of ID Thm. 2], the thresholding condition ( fTTI ). One 



of the difficulties in the proof is to handle the dependence 
between random events at different iterations of the SequOMP 
algorithm. To avoid this difficulty, we first show an equivalence 
between the success of SequOMP and an alternative sequence 
of events that is easier to analyze. After this simplification, 
small modifications handle the cancellations of detected vec- 
tors. 

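Fix $n$ and define

$$I_{\rm true}(j) = \{ \ell : \ell \in I_{\rm true},\ \ell \leq j \},$$

which is the set of elements of the true support with indices
$\ell \leq j$. Observe that $I_{\rm true}(0) = \emptyset$ and $I_{\rm true}(n) = I_{\rm true}$.

Let $P_{\rm true}(j)$ be the projection operator onto the orthogonal
complement of $\{a_\ell,\ \ell \in I_{\rm true}(j - 1)\}$, and define

$$\rho_{\rm true}(j) = \frac{|a_j' P_{\rm true}(j) y|^2}{\|P_{\rm true}(j) a_j\|^2 \, \|P_{\rm true}(j) y\|^2}. \tag{39}$$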



A simple induction argument shows that SequOMP correctly
detects the support if and only if, at each iteration $j$, the
variables $\hat{I}(j)$, $P(j)$ and $\rho(j)$ defined in the algorithm are equal
to $I_{\rm true}(j)$, $P_{\rm true}(j)$ and $\rho_{\rm true}(j)$, respectively. Therefore, if
we define

$$\hat{I} = \{ j : \rho_{\rm true}(j) > \mu \}, \tag{40}$$

then SequOMP correctly detects the support if and only if
$\hat{I} = I_{\rm true}$. In particular,

$$p_{\rm err}(n) = \Pr\left( \hat{I} \neq I_{\rm true} \right).$$
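
The sequential test built on the correlation (39) can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes real-valued data, keeps an orthonormal basis of the detected columns (Gram-Schmidt) in place of an explicit projection matrix, and treats a vanishing residual as zero correlation. The function names and the toy data are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_out(v, basis):
    """Project v onto the orthogonal complement of the orthonormal vectors in basis."""
    r = list(v)
    for q in basis:
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def sequomp(columns, y, mu, tol=1e-12):
    """Scan the columns once, in order; keep index j when rho(j) > mu."""
    basis, support = [], []
    for j, a in enumerate(columns):
        pa = project_out(a, basis)   # P(j) a_j
        py = project_out(y, basis)   # P(j) y
        na, ny = dot(pa, pa), dot(py, py)
        # a_j' P(j) y equals (P(j) a_j)' (P(j) y) because P(j) is a projection.
        rho = dot(pa, py) ** 2 / (na * ny) if na > tol and ny > tol else 0.0
        if rho > mu:
            support.append(j)
            norm = math.sqrt(na)
            basis.append([x / norm for x in pa])   # cancel the detected column
    return support


# Toy case: orthonormal columns, no noise, powers in decreasing order.
columns = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
y = [4.0, 0.0, 2.0, 0.0]
detected = sequomp(columns, y, mu=0.5)       # zero-based indices
missed_first = sequomp(columns, y, mu=0.9)   # threshold too high for index 0
```

In the toy case the first call detects indices 0 and 2. With `mu=0.9` the first component is missed, its energy stays in the residual, and the correlation of the third column drops from 1.0 to 0.2, so nothing is detected. This is the error propagation that motivates robust power shaping.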



To prove that $p_{\rm err}(n) \to 0$, it suffices to show that there
exists a sequence of threshold levels $\mu(n)$ such that the following
two limits

$$\liminf_{n \to \infty} \min_{j \in I_{\rm true}(n)} \frac{\rho_{\rm true}(j)}{\mu} > 1, \tag{41}$$

$$\limsup_{n \to \infty} \max_{j \notin I_{\rm true}(n)} \frac{\rho_{\rm true}(j)}{\mu} < 1, \tag{42}$$

hold in probability. The first limit (41) ensures that no
component in the true support is missed and will be
called the zero missed detection condition. The second limit
(42) ensures that no component outside the true support
is falsely detected and will be called the zero false
alarm condition.

Set the sequence of threshold levels as follows. Since $\delta > 0$,
we can find an $\epsilon > 0$ such that

$$(1 + \delta) \geq (1 + \epsilon)^2. \tag{43}$$

For each $n$, let the threshold level be

$$\mu = (1 + \epsilon) \frac{2 \log(n(1 - \lambda))}{m - \lambda n}. \tag{44}$$

The asymptotic lack of missed detections and false alarms
with these thresholds is proven in Appendices D and E,
respectively. In preparation for these sections, Appendix B
reviews some facts concerning tail bounds on chi-squared
and beta random variables, and Appendix C performs some
preliminary computations.

B. Chi-Squared and Beta Random Variables

The proof requires a number of simple facts concerning
chi-squared and beta random variables. These variables are
reviewed in [37]. We will omit all the proofs in this subsection,
as they can be proved along the lines of the calculations in

ii. 

A random variable $u$ has a chi-squared distribution with $r$
degrees of freedom if it can be written as $u = \sum_{i=1}^{r} z_i^2$, where
the $z_i$ are i.i.d. $\mathcal{N}(0, 1)$.

Lemma 1: Suppose $x \in \mathbb{R}^r$ has a Gaussian distribution
$\mathcal{N}(0, \sigma^2 I_r)$. Then:

(a) $\|x\|^2/\sigma^2$ is chi-squared with $r$ degrees of freedom; and

(b) if $y$ is any other $r$-dimensional random vector that is
nonzero with probability one and independent of $x$, then
the variable

$$\frac{|x'y|^2}{\sigma^2 \|y\|^2}$$

is a chi-squared random variable with one degree of
freedom.

The following two lemmas provide standard tail bounds.

Lemma 2: Suppose that for each $n$, $\{x_j^{(n)}\}_{j=1}^{n}$ is a set of
Gaussian random vectors with each $x_j^{(n)}$ spherically symmetric
in an $m_j(n)$-dimensional space. The variables may be dependent.
Suppose also that $\mathbf{E}\|x_j^{(n)}\|^2 = 1$ and

$$\lim_{n \to \infty} \log(n)/m_{\min}(n) = 0,$$

where

$$m_{\min}(n) = \min_{j=1,\ldots,n} m_j(n).$$

Then the limits

$$\lim_{n \to \infty} \max_{j=1,\ldots,n} \|x_j^{(n)}\|^2 = \lim_{n \to \infty} \min_{j=1,\ldots,n} \|x_j^{(n)}\|^2 = 1$$

hold in probability.

Lemma 3: Suppose that for each $n$, $\{u_j^{(n)}\}_{j=1}^{n}$ is a set
of chi-squared random variables, each with one degree of
freedom. The variables may be dependent. Then

$$\limsup_{n \to \infty} \max_{j=1,\ldots,n} \frac{u_j^{(n)}}{2 \log(n)} \leq 1, \tag{45}$$

where the limit is in probability.
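
A quick Monte Carlo experiment gives a feel for the normalization in (45). The sketch below assumes i.i.d. draws, which is only one special case (the lemma allows dependence), and the sample sizes and seed are arbitrary. Convergence is slow, so for moderate $n$ the ratio typically sits noticeably below one.

```python
import math
import random


def max_chi2_ratio(n, rng):
    """Return max_j u_j / (2 log n) for n i.i.d. chi-squared(1) draws u_j = z_j^2."""
    biggest = max(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
    return biggest / (2.0 * math.log(n))


rng = random.Random(1)
ratios = {n: max_chi2_ratio(n, rng) for n in (100, 10000, 200000)}
for n in sorted(ratios):
    print(n, round(ratios[n], 3))
```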

The final two lemmas concern certain beta distributed
random variables. A real-valued scalar random variable $w$
follows a Beta$(r, s)$ distribution if it can be written as
$w = u_r/(u_r + v_s)$, where the variables $u_r$ and $v_s$ are independent
chi-squared random variables with $r$ and $s$ degrees of freedom,
respectively. The importance of the beta distribution is given
by the following lemma.

Lemma 4: Suppose $x$ and $y$ are independent random
$r$-dimensional vectors with $x$ being spherically-symmetrically
distributed in $\mathbb{R}^r$ and $y$ having any distribution
that is nonzero with probability one. Then the random variable

$$w = \frac{|x'y|^2}{\|x\|^2 \|y\|^2}$$

is independent of $x$ and follows a Beta$(1, r - 1)$ distribution.


The following lemma provides a simple expression for the 
maxima of certain beta distributed variables. 

Lemma 5: For each $n$, suppose $\{w_j^{(n)}\}_{j=1}^{n}$ is a set of
random variables with $w_j^{(n)}$ having a Beta$(1, m_j(n) - 1)$
distribution. Suppose that

$$\lim_{n \to \infty} \log(n)/m_{\min}(n) = 0, \qquad \lim_{n \to \infty} m_{\min}(n) = \infty, \tag{46}$$

where

$$m_{\min}(n) = \min_{j=1,\ldots,n} m_j(n).$$

Then,

$$\limsup_{n \to \infty} \max_{j=1,\ldots,n} \frac{m_j(n)}{2 \log(n)} w_j^{(n)} \leq 1$$

in probability.

C. Preliminary Computations and Technical Lemmas 

We first need to prove a number of simple but technical
bounds. We begin by considering the dimension $m_i$ defined
as

$$m_i = \dim(\mathrm{range}(P_{\rm true}(i))). \tag{47}$$

Our first lemma computes the limit of this dimension.

Lemma 6: The following limit

$$\lim_{n \to \infty} \min_{i=1,\ldots,n} \frac{m_i}{m - \lambda n} = 1 \tag{48}$$

holds in probability and almost surely. The deterministic limits

$$\lim_{n \to \infty} \frac{\log(\lambda n)}{m - \lambda n} = \lim_{n \to \infty} \frac{\log((1 - \lambda)n)}{m - \lambda n} = 0 \tag{49}$$

also hold.

Proof: Recall that $P_{\rm true}(i)$ is the projection onto the
orthogonal complement of the vectors $a_j$ with $j \in I_{\rm true}(i - 1)$.
With probability one, these vectors will be linearly independent,
so the range of $P_{\rm true}(i)$ will have dimension $m - |I_{\rm true}(i - 1)|$. Since
$I_{\rm true}(i)$ is increasing with $i$,

$$\min_{i=1,\ldots,n} m_i = m - \max_{i=1,\ldots,n} |I_{\rm true}(i - 1)| = m - |I_{\rm true}(n - 1)|. \tag{50}$$

Since each user is active with probability $\lambda$ and the activities
of the users are independent, the law of large numbers shows
that

$$\lim_{n \to \infty} \frac{|I_{\rm true}(n - 1)|}{\lambda(n - 1)} = 1$$

in probability and almost surely. Combining this with (50)
shows (48).

We next show ( |49] l. Since the hypothesis of the theorem 
requires that An, (1 — A)n and to — An all approach infinity, 
the fractions in ( l49l l are eventually positive. Also, from ( fTSI ). 

L{X,n) < max{log(An), log((l — A)n)}. Therefore, from 



1 



|x|P||y||2 



TO — An 
< 



c{log(An),log((l-A)n)} 



max{log(An), log((l - A)n)} < ^ ^ 0, 
is independent of x and follows a Beta(l,r — 1) distribution, where the last step is from the hypothesis of the theorem. I 
Next, for each $i = 1, \ldots, n$, define the residual vector

$$e_i = P_{\rm true}(i)(y - a_i x_i). \tag{51}$$

Observe that

$$e_i = P_{\rm true}(i)(y - a_i x_i) \overset{(a)}{=} P_{\rm true}(i) \Big( d + \sum_{j \neq i} a_j x_j \Big) \overset{(b)}{=} P_{\rm true}(i) \Big( d + \sum_{j > i} a_j x_j \Big), \tag{52}$$

where (a) follows from (1) and (b) follows from the fact that
$P_{\rm true}(i)$ is the projection onto the orthogonal complement of
the span of all vectors $a_j$ with $j < i$ and $x_j \neq 0$.

The next lemma shows that the power of the residual vector
is described by the random variable

$$\hat{\sigma}^2(i) = 1 + \sum_{j=i+1}^{n} |x_j|^2. \tag{53}$$

Lemma 7: For all $i = 1, \ldots, n$, the residual vector $e_i$,
conditioned on the modulation vector $x$ and projection $P_{\rm true}(i)$,
is a spherically symmetric Gaussian in the range space of
$P_{\rm true}(i)$ with total variance

$$\mathbf{E}\left( \|e_i\|^2 \mid x \right) = \frac{m_i}{m} \hat{\sigma}^2(i), \tag{54}$$

where $m_i$ and $\hat{\sigma}^2(i)$ are defined in (47) and (53), respectively.

Proof: Let

$$v_i = d + \sum_{j > i} a_j x_j,$$

so that $e_i = P_{\rm true}(i) v_i$. Since the vectors $a_j$ and $d$ have
Gaussian $\mathcal{N}(0, (1/m) I_m)$ distributions, for a given vector $x$,
$v_i$ must be a zero-mean white Gaussian vector with total
variance $\mathbf{E}\|v_i\|^2 = \hat{\sigma}^2(i)$. Also, since the operator $P_{\rm true}(i)$
is a function of the components $x_\ell$ and vectors $a_\ell$ for $\ell < i$,
$P_{\rm true}(i)$ is independent of the vectors $d$ and $a_j$, $j > i$, and
therefore independent of $v_i$. Since $P_{\rm true}(i)$ is a projection
from an $m$-dimensional space to an $m_i$-dimensional space, $e_i$,
conditioned on the modulation vector $x$, must be spherically
symmetric Gaussian in the range space of $P_{\rm true}(i)$ with total
variance satisfying (54). ∎

Our next lemma requires the following version of the well-known
Hoeffding's inequality.

Lemma 8 (Hoeffding's Inequality): Suppose $z$ is the sum

$$z = z_0 + \sum_{i=1}^{r} z_i,$$

where $z_0$ is a constant and the variables $z_i$ are independent
random variables that are almost surely bounded in some
interval $z_i \in [a_i, b_i]$. Then, for all $\epsilon > 0$,

$$\Pr\left( z - \mathbf{E}(z) \geq \epsilon \right) \leq \exp\left( -\frac{2\epsilon^2}{C} \right),$$

where

$$C = \sum_{i=1}^{r} (b_i - a_i)^2.$$

Proof: See [38]. ∎
Lemma 9: Under the assumptions of Theorem 1, the limit

$$\limsup_{n \to \infty} \max_{i=1,\ldots,n} \frac{\hat{\sigma}^2(i)}{\sigma^2(i)} \leq 1$$

holds in probability.

Proof: Let $z(i) = \hat{\sigma}^2(i)/\sigma^2(i)$. From the definition of
$\hat{\sigma}^2(i)$ in (53), we can write

$$z(i) = \frac{1}{\sigma^2(i)} + \sum_{j=i+1}^{n} z(i, j),$$

where $z(i, j) = |x_j|^2/\sigma^2(i)$ for $j > i$.

Now recall that in the problem formulation, each user is
active with probability $\lambda$, with power $|x_j|^2 = p_j$ conditioned
on the user being active. Also, the activities of different
users are independent, and the conditional powers $p_j$ are
treated as deterministic quantities. Therefore, the variables
$z(i, j)$ are independent with

$$z(i, j) = \begin{cases} p_j/\sigma^2(i), & \text{with probability } \lambda; \\ 0, & \text{with probability } 1 - \lambda, \end{cases}$$

for $j > i$. Combining this with the definition of $\sigma^2(i)$ in (24b),
we see that

$$\mathbf{E}(z(i)) = 1.$$

Also, for each $j > i$, we have the bound

$$z(i, j) \in [0, p_j/\sigma^2(i)].$$

So for use in Hoeffding's Inequality (Lemma 8), define

$$C(i, n) = \sum_{j=i+1}^{n} \left( \frac{p_j}{\sigma^2(i)} \right)^2,$$

where dependence of the power profile and $\sigma(i)$ on $n$ is
implicit. Now define

$$c_n = \max_{i=1,\ldots,n} \log(n)\, C(i, n),$$

so that $C(i, n) \leq c_n/\log(n)$ for all $i$. Hoeffding's Inequality
(Lemma 8) now shows that for all $i \leq n$,

$$\Pr(z(i) \geq 1 + \epsilon) \leq \exp\left( -2\epsilon^2/C(i, n) \right) \leq \exp\left( -2\epsilon^2 \log(n)/c_n \right).$$

Using the union bound,

$$\lim_{n \to \infty} \Pr\left( \max_{i=1,\ldots,n} z(i) > 1 + \epsilon \right) \leq \lim_{n \to \infty} n \exp\left( -\frac{2\epsilon^2 \log(n)}{c_n} \right) = \lim_{n \to \infty} n^{1 - 2\epsilon^2/c_n} = 0.$$

The final step is due to the fact that the technical condition
(25) in the theorem implies $c_n \to 0$. This proves the lemma. ∎

D. Missed Detection Probability

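Consider any $j \in I_{\rm true}$. Using (51) to rewrite (39) along
with some algebra shows

$$\rho_{\rm true}(j) = \frac{|a_j' P_{\rm true}(j) y|^2}{\|P_{\rm true}(j) a_j\|^2 \, \|P_{\rm true}(j) y\|^2} = \frac{|a_j'(x_j P_{\rm true}(j) a_j + e_j)|^2}{\|P_{\rm true}(j) a_j\|^2 \, \|x_j P_{\rm true}(j) a_j + e_j\|^2} \geq \frac{s_j - 2\sqrt{z_j s_j} + z_j}{s_j + 2\sqrt{z_j s_j} + 1}, \tag{55}$$

where

$$s_j = \frac{|x_j|^2 \, \|P_{\rm true}(j) a_j\|^2}{\|e_j\|^2}, \tag{56}$$

$$z_j = \frac{|a_j' P_{\rm true}(j) e_j|^2}{\|P_{\rm true}(j) a_j\|^2 \, \|e_j\|^2}. \tag{57}$$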


Define

$$s_{\min} = \min_{j \in I_{\rm true}} s_j, \qquad z_{\max} = \max_{j \in I_{\rm true}} z_j.$$

We first start with $s_{\min}$. Conditional on $x$ and $P_{\rm true}(j)$,
Lemma 7 shows that each $e_j$ is a spherically-symmetrically
distributed Gaussian on the $m_j$-dimensional range space of
$P_{\rm true}(j)$. Since there are asymptotically $\lambda n$ elements in $I_{\rm true}$,
Lemma 2 along with (49) show that

$$\lim_{n \to \infty} \max_{j \in I_{\rm true}} \frac{m}{m_j \hat{\sigma}^2(j)} \|e_j\|^2 = 1, \tag{58}$$

where the limit is in probability. Similarly, $P_{\rm true}(j) a_j$ is
also a spherically-symmetrically distributed Gaussian in the
range space of $P_{\rm true}(j)$. Since $P_{\rm true}(j)$ is a projection
from an $m$-dimensional space to an $m_j$-dimensional space
and $\mathbf{E}\|a_j\|^2 = 1$, we have that $\mathbf{E}\|P_{\rm true}(j) a_j\|^2 = m_j/m$.
Therefore, Lemma 2 along with (49) show that

$$\lim_{n \to \infty} \min_{j \in I_{\rm true}} \frac{m}{m_j} \|P_{\rm true}(j) a_j\|^2 = 1. \tag{59}$$
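Taking the limit (in probability) of $s_{\min}$,

$$\begin{aligned}
\liminf_{n \to \infty} \frac{s_{\min}}{\gamma}
&= \liminf_{n \to \infty} \min_{j \in I_{\rm true}} \frac{s_j}{\gamma} \\
&\overset{(a)}{=} \liminf_{n \to \infty} \min_{j \in I_{\rm true}} \frac{|x_j|^2 \, \|P_{\rm true}(j) a_j\|^2}{\gamma \|e_j\|^2} \\
&\overset{(b)}{=} \liminf_{n \to \infty} \min_{j \in I_{\rm true}} \frac{m\, |x_j|^2 \, \|P_{\rm true}(j) a_j\|^2}{m_j\, \gamma \hat{\sigma}^2(j)} \\
&\overset{(c)}{=} \liminf_{n \to \infty} \min_{j \in I_{\rm true}} \frac{p_j}{\gamma \hat{\sigma}^2(j)} \\
&\overset{(d)}{\geq} \liminf_{n \to \infty} \min_{j \in I_{\rm true}} \frac{p_j}{\gamma \sigma^2(j)} \\
&\overset{(e)}{\geq} 1,
\end{aligned} \tag{60}$$

where (a) follows from (56); (b) follows from (58);
(c) follows from (59); (d) follows from Lemma 9; and (e)
follows from (23).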

We next consider $z_{\max}$. Conditional on $P_{\rm true}(j)$, the vectors
$P_{\rm true}(j) a_j$ and $e_j$ are independent spherically-symmetric
Gaussians in the range space of $P_{\rm true}(j)$. It follows from
Lemma 4 that each $z_j$ is a Beta$(1, m_j - 1)$ random variable.
Since there are asymptotically $\lambda n$ elements in $I_{\rm true}$, Lemma 5
along with (48) and (49) show that

$$\limsup_{n \to \infty} \frac{m - \lambda n}{2 \log(\lambda n)} z_{\max} = \limsup_{n \to \infty} \frac{m - \lambda n}{2 \log(\lambda n)} \max_{j \in I_{\rm true}} z_j \leq 1. \tag{61}$$



The above analysis shows that for any j e /true, 

liminf min i^/sl — ^/zi) 

n^oo je/,_ ^ ^ ^' 

(a) 1 

> lim inf —— (i/Smin - V^max) 



> lim inf -1/7 



2 log(An) 
rn — An 



> lim inf . 

> lim inf 

n— f 00 



1 + S 



2 log(An) 
m — Xn 



(.) ^.^.^ 2(l + .)log(n(l-A)) 



= lim inf \ / -— ^ > y/l + e 

n-s-cx3 V 1 + e 



(62) 



where (a) follows from the definitions of $s_{\min}$ and $z_{\max}$; (b)
follows from (60) and (61); (c) follows from (26); (d) follows
from (18); (e) follows from (44); and (f) follows from (43).
Therefore, starting with (55),

lim ml mm 

. . . . 1 Sj -"^y/zjsj +Zj 

> lim ml mm 



>00 je/truo /i Sj + 2y/ZjSj + 1 

limmt mm — - 



n^oo je/truo /i Sj + 2^ZjSj + 1 

> limmt mm 

n^oo je/truo Sj + 2^ZjSj 

r • f • 1 + g 

> limmt mm 

n^oo je/fuo Sj + + 1 



1 



> liminf min 

n^oo je/truo Smin 



1 + e 



l + e ( 



> liminf min 



1 

1 + 



where (a) follows from (55); (b) follows from (62); (c) follows
from the fact that $z_j \in [0, 1]$ (it is a Beta distributed random
variable); (d) follows from (60); and (e) follows from the
condition of the hypothesis of the theorem that $\gamma \to 0$. This
proves the first requirement, condition (41).

E. False Alarm Probability 

Now consider any index $j \notin I_{\rm true}$. This implies that $x_j = 0$
and therefore (51) shows that

$$P_{\rm true}(j) y = e_j.$$

Hence from (39),

$$\rho_{\rm true}(j) = \frac{|a_j' P_{\rm true}(j) e_j|^2}{\|P_{\rm true}(j) a_j\|^2 \, \|e_j\|^2} = z_j, \tag{63}$$

where $z_j$ is defined in (57). From the discussion above,
each $z_j$ has the Beta$(1, m_j - 1)$ distribution. Since there are
asymptotically $(1 - \lambda)n$ elements in $I_{\rm true}^c$, the conditions (48)
and (49) along with Lemma 5 show that the limit

$$\limsup_{n \to \infty} \max_{j \notin I_{\rm true}} \frac{m - \lambda n}{2 \log(n(1 - \lambda))} z_j \leq 1 \tag{64}$$

holds in probability. Therefore,

$$\limsup_{n \to \infty} \max_{j \notin I_{\rm true}} \frac{1}{\mu} \rho_{\rm true}(j) \overset{(a)}{=} \limsup_{n \to \infty} \max_{j \notin I_{\rm true}} \frac{1}{\mu} z_j \overset{(b)}{=} \limsup_{n \to \infty} \max_{j \notin I_{\rm true}} \frac{m - \lambda n}{(1 + \epsilon)\, 2 \log(n(1 - \lambda))} z_j \overset{(c)}{\leq} \frac{1}{1 + \epsilon},$$

where (a) follows from (63); (b) follows from (44); and (c)
follows from (64). This proves (42) and thus completes the
proof of the theorem. ∎